Abstract


This paper provides a simple method to estimate both univariate and multivariate MA processes. Similar to Durbin's method, it rests on the recursive relation between the parameters of the MA process and those of its AR representation. This recursive relation is shown to be valid both for invertible/stable and non-invertible/unstable processes under the assumption that the process has no constant and started from zero. This makes the method suitable for unit root processes too.
1 Introduction and Summary

A classic method to estimate the parameters of pure MA processes is provided by Durbin (1959). This method is a two-step estimation. First, an AR process of large order is estimated using OLS. The parameter estimates are then used to estimate the parameters of the MA process using the Yule-Walker estimator, which Durbin shows to be efficient for this purpose.
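The two steps can be sketched in Python with NumPy. This is a minimal illustration, not Durbin's original procedure. The function names, the AR order of 20 in the usage lines, the zero pre-sample innovations and the sign convention $y_t = \varepsilon_t + \psi_1\varepsilon_{t-1} + \cdots + \psi_q\varepsilon_{t-q}$ are all assumptions of this example. The second step treats the coefficients $1, -\hat\phi_1, \ldots, -\hat\phi_p$ of the fitted AR polynomial as a short series and fits an AR(q) to it by Yule-Walker.

```python
import numpy as np


def simulate_ma(psi, n, rng):
    """Simulate y_t = eps_t + psi_1 eps_{t-1} + ... + psi_q eps_{t-q}.

    Pre-sample innovations are set to zero, i.e. the process starts from zero.
    """
    eps = rng.standard_normal(n)
    return np.convolve(eps, np.r_[1.0, psi])[:n]


def ols_ar(y, p):
    """Step 1: OLS fit of y_t = phi_1 y_{t-1} + ... + phi_p y_{t-p} + e_t (no constant)."""
    n = len(y)
    X = np.column_stack([y[p - i:n - i] for i in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return phi


def durbin_ma(phi_hat, q):
    """Step 2: Yule-Walker fit of an AR(q) to the sequence 1, -phi_1, ..., -phi_p.

    These are the coefficients a_j of phi(L) = 1 - sum_j phi_j L^j, which
    approximately satisfy a_j + psi_1 a_{j-1} + ... + psi_q a_{j-q} = 0.
    """
    a = np.r_[1.0, -np.asarray(phi_hat, dtype=float)]
    # "autocovariances" of the coefficient sequence at lags 0..q
    r = np.array([a[:len(a) - k] @ a[k:] for k in range(q + 1)])
    # Toeplitz matrix R[i, l] = r[|i - l|]
    R = r[np.abs(np.subtract.outer(np.arange(q), np.arange(q)))]
    return np.linalg.solve(R, -r[1:])


rng = np.random.default_rng(0)
y = simulate_ma([0.5], 2000, rng)
psi_hat = durbin_ma(ols_ar(y, 20), q=1)
print(psi_hat)  # should be close to the true value 0.5
```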

In this paper, I show that under one assumption, the well-known recursive formulae to compute the MA representation of an AR process and the AR representation of an MA process can be derived without assuming stability of the AR process or invertibility of the MA process. Inspecting the recursive formula reveals that the AR parameters can be viewed as generated by an AR process parameterized by the MA parameters. Therefore Durbin's method is basically a Yule-Walker estimation of the latter AR(q) process on the parameters from the initial AR estimation. It is, however, well known and documented by, e.g., Sandgren and Stoica (2006) that the precision of estimates obtained by Durbin's method is unsatisfactory if one or more roots of the data-generating MA process are close to the unit circle. The reason for this is simple: as documented by Tjostheim and Paulsen (1983) and others, the Yule-Walker estimator is not well suited to estimate AR processes with roots close to the unit circle and is biased if one root is on the unit circle. Since the OLS estimator is consistent also for AR processes with a unit root, a natural modification of Durbin's method is the use of the OLS estimator for the second estimation. But since the Yule-Walker estimator is efficient, one faces a trade-off between lower efficiency for MA processes with roots away from the unit circle and more reliable estimates when one or more roots are close to unity. However, by exploiting the structural information implied by the recursive formula, one can use a restricted OLS estimator to counteract this disadvantage. I perform a simulation over the parameter space of an MA(2) process which indicates that this approach yields substantial gains in efficiency compared to Durbin's method for a very large part of the investigated parameter space. Finally, a generalization of the approach to multivariate MA processes is provided.

2 The recursive formulae 

2.1 A more general MA representation 

The most important tool throughout the analysis is the so-called companion form of an AR(p) process. This means that the process is written as a VAR(1).
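For reference, a standard way to write this companion form for $y_t = \nu + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t$ is sketched below. The symbol names ($\mathbf{y}_t$, $\mathbf{v}$, $F$, $\boldsymbol{\varepsilon}_t$) are chosen to match the equations that follow and are otherwise an assumption of this sketch:

$$
\mathbf{y}_t = \mathbf{v} + F\mathbf{y}_{t-1} + \boldsymbol{\varepsilon}_t, \qquad
\mathbf{y}_t = \begin{pmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{pmatrix}, \quad
F = \begin{pmatrix}
\phi_1 & \phi_2 & \cdots & \phi_{p-1} & \phi_p\\
1 & 0 & \cdots & 0 & 0\\
0 & 1 & \cdots & 0 & 0\\
\vdots & & \ddots & & \vdots\\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}, \quad
\mathbf{v} = \begin{pmatrix} \nu \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad
\boldsymbol{\varepsilon}_t = \begin{pmatrix} \varepsilon_t \\ 0 \\ \vdots \\ 0 \end{pmatrix}
$$

Only the first row of $F$ carries parameters. The identity block below it shifts the stacked lags down by one period.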
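For a given vector of starting values $\mathbf{y}_{t-\tau}$, it is always possible to write the process as a function of error terms, starting values and time:

$$
\begin{aligned}
\mathbf{y}_{t-\tau+1} &= \mathbf{v} + F\mathbf{y}_{t-\tau} + \boldsymbol{\varepsilon}_{t-\tau+1}\\
\mathbf{y}_{t-\tau+2} &= \mathbf{v} + F\mathbf{v} + F^2\mathbf{y}_{t-\tau} + F\boldsymbol{\varepsilon}_{t-\tau+1} + \boldsymbol{\varepsilon}_{t-\tau+2}\\
\mathbf{y}_{t-\tau+3} &= \mathbf{v} + F\mathbf{v} + F^2\mathbf{v} + F^3\mathbf{y}_{t-\tau} + F^2\boldsymbol{\varepsilon}_{t-\tau+1} + F\boldsymbol{\varepsilon}_{t-\tau+2} + \boldsymbol{\varepsilon}_{t-\tau+3}
\end{aligned}
$$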
Defining $F^0 = I$ and using $(I - F)^{-1}(I - F)\sum_{j=0}^{\tau-1}F^j = (I - F)^{-1}(I - F^{\tau})$, one can write this equation in a manner that I call the generalized moving average representation.
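Summing the pattern above over $\tau$ steps gives the following sketch of the representation. Here $\mathbf{w}_t$ is taken to collect the deterministic part (constant and starting values). This is a notational assumption, consistent with how $\mathbf{w}_t$ is used below:

$$
\mathbf{y}_t = \mathbf{w}_t + \sum_{j=0}^{\tau-1} F^j \boldsymbol{\varepsilon}_{t-j},
\qquad
\mathbf{w}_t = (I - F)^{-1}(I - F^{\tau})\,\mathbf{v} + F^{\tau}\mathbf{y}_{t-\tau}
\qquad (2)
$$

The closed form of the first term of $\mathbf{w}_t$ requires $I - F$ to be invertible. Otherwise it can be left as $\sum_{j=0}^{\tau-1} F^j \mathbf{v}$.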
This name seems appropriate since the generalized moving average representation contains the textbook moving average representation in its companion-form version as a special case. This is easy to see in the case of distinct eigenvalues of $F$. It is well known that in this case $F$ can be decomposed as $W\Lambda W^{-1}$, where $\Lambda$ is a diagonal matrix of the eigenvalues and $W$ a matrix containing the eigenvectors. This allows one to write $F^{\tau} = W\Lambda^{\tau}W^{-1}$. As shown by Hamilton (1994, pp. 21), the eigenvalues of $F$ are equal to the inverse roots of the lag polynomial, and hence the process is stable if the moduli of all eigenvalues of $F$ are smaller than one. Then for $\tau \to \infty$, $\Lambda^{\tau} \to 0_p$, where $0_p$ denotes a $p \times p$ matrix of zeros. Thus $\lim_{\tau\to\infty} E[\mathbf{w}_t] = (I - F)^{-1}\mathbf{v} = \boldsymbol{\mu}$, which allows one to write (2) for $\tau = \infty$ and eigenvalues inside the unit circle in the textbook form.
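Writing $F^j = W\Lambda^jW^{-1}$ and letting $\tau \to \infty$, the starting values drop out. As a sketch, the representation then takes the familiar infinite-order form:

$$
\mathbf{y}_t = \boldsymbol{\mu} + \sum_{j=0}^{\infty} W\Lambda^{j}W^{-1}\boldsymbol{\varepsilon}_{t-j}
\qquad (3)
$$

Its first row is the usual MA(∞) representation $y_t = \mu + \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}$, since only the first element of $\boldsymbol{\varepsilon}_{t-j}$ is nonzero.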
The literature rarely approaches AR processes via the companion form but uses lag polynomials instead. However, their use when one seeks to invert a lag polynomial is tied to the assumptions $\tau = \infty$ and stability/invertibility. For the important special case of a process with $\mathbf{v} = 0$, these assumptions can be replaced with a single assumption that might be more sensible in some settings:

Assumption: The "memory" of the process has been emptied at some time in the past, meaning $\mathbf{y}_{t-\tau} = 0$.

Under this assumption $\mathbf{w}_t = 0$ for any parametrization, and (2) looks quite similar to (3) with $\boldsymbol{\mu} = 0$:

$$\mathbf{y}_t = \sum_{j=0}^{\tau-1} F^j \boldsymbol{\varepsilon}_{t-j} \qquad (4)$$

However, (4) is an MA representation of both stable and unstable AR processes. From here on we shall impose both $\mathbf{v} = 0$ and $\mathbf{y}_{t-\tau} = 0$, where $\tau$ might be either finite or infinite.

2.2 The recursive formula for AR processes 

To the best of my knowledge, the only way the recursive formula has been derived previously is by using the lag polynomial of the AR process, that is, by assuming $\tau = \infty$ and stability. I reproduce this approach here for convenience. Consider an AR(p) process without constant in lag notation: $\phi(L)y_t = \varepsilon_t$. Writing $\phi(L)^{-1} = \psi(L)$ gives the identity:

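$$
\begin{aligned}
1 = \phi(L)\psi(L) &= \Big(1 - \sum_{j=1}^{p}\phi_j L^j\Big)\Big(\sum_{i=0}^{\infty}\psi_i L^i\Big)\\
&= \sum_{i=0}^{\infty}\psi_i L^i - \phi_1 L\sum_{i=0}^{\infty}\psi_i L^i - \cdots - \phi_p L^p\sum_{i=0}^{\infty}\psi_i L^i\\
&= \psi_0 + (\psi_1 - \phi_1\psi_0)L + (\psi_2 - \phi_1\psi_1 - \phi_2\psi_0)L^2 + \cdots\\
&\quad + (\psi_p - \phi_1\psi_{p-1} - \phi_2\psi_{p-2} - \cdots - \phi_p\psi_0)L^p + \cdots\\
&\quad + (\psi_{p+1} - \phi_1\psi_p - \cdots - \phi_p\psi_1)L^{p+1} + \cdots
\end{aligned}
$$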

Since $\psi_0 = 1$, the parameters of the MA representation can be computed recursively as $\psi_1 = \phi_1$, $\psi_2 = \phi_1\psi_1 + \phi_2$, and so on.
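The recursion fits in a few lines. The Python sketch below computes $\psi_0, \ldots, \psi_n$ from $\phi_1, \ldots, \phi_p$; the function name and the use of NumPy are assumptions of this example. It only matches coefficients in $\phi(L)\psi(L) = 1$ and never checks stability, so it also runs for a random walk.

```python
import numpy as np


def ma_weights_from_ar(phi, n):
    """Return psi_0, ..., psi_n of the MA representation of an AR(p) process.

    Uses psi_0 = 1 and psi_j = phi_1 psi_{j-1} + ... + phi_p psi_{j-p},
    with psi_k = 0 for k < 0. No stability condition is needed.
    """
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    psi = np.zeros(n + 1)
    psi[0] = 1.0
    for j in range(1, n + 1):
        for i in range(1, min(j, p) + 1):
            psi[j] += phi[i - 1] * psi[j - i]
    return psi


print(ma_weights_from_ar([0.5, 0.3], 3))  # approximately 1, 0.5, 0.55, 0.425
print(ma_weights_from_ar([1.0], 4))       # random walk: every weight equals one
```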
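To derive the same recursive formula from (4), one has to compute the powers of $F$, which seems harder than it actually is. Consider $F^2$:

$$
F^2 = \begin{pmatrix}
\phi_1^2 + \phi_2 & \phi_1\phi_2 + \phi_3 & \cdots & \phi_1\phi_{p-1} + \phi_p & \phi_1\phi_p\\
\phi_1 & \phi_2 & \cdots & \phi_{p-1} & \phi_p\\
1 & 0 & \cdots & 0 & 0\\
\vdots & \ddots & & & \vdots\\
0 & \cdots & 1 & 0 & 0
\end{pmatrix} \qquad (5)
$$

Obviously, the first column contains the first parameters of the MA representation, $(\psi_2, \psi_1, \psi_0, 0, \ldots, 0)'$. Since all elements of $\boldsymbol{\varepsilon}_{t-j}$ except the first are zero, one can ignore the values in all columns of $F^j$ except for the first one. Knowing $F^2$ and recalling that $F^3$ is generated by the multiplication $F \times F^2$ allows one to compute $F^3$ as:

$$
F^3 = F F^2 = \begin{pmatrix}
\phi_1\psi_2 + \phi_2\psi_1 + \phi_3 & w & \cdots & w\\
\psi_2 & w & \cdots & w\\
\psi_1 & w & \cdots & w\\
1 & w & \cdots & w\\
\vdots & \vdots & & \vdots
\end{pmatrix}
$$

where $w$ indicates values that are of no interest for the analysis. Thus, $\psi_3 = \phi_1\psi_2 + \phi_2\psi_1 + \phi_3$, and one can compute $F^4$ and all $F^j$ for $j = 2, \ldots, \tau$ in a similar manner. Note that for $j > p$ one gets:

$$
F^j = F F^{j-1} = \begin{pmatrix}
\phi_1\psi_{j-1} + \phi_2\psi_{j-2} + \cdots + \phi_p\psi_{j-p} & w & \cdots & w\\
\psi_{j-1} & w & \cdots & w\\
\vdots & \vdots & & \vdots\\
\psi_{j-p+1} & w & \cdots & w
\end{pmatrix}
$$

This means that for $j > p$, the MA parameters are generated by the difference equation $\psi_j = \sum_{i=1}^{p}\phi_i\psi_{j-i}$.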
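The same weights can also be read off numerically from powers of the companion matrix. In the Python sketch below (hypothetical helper names), $\psi_j$ is taken as the top-left element of $F^j$, and $F^{j+1}$ is formed as $F F^j$, mirroring the argument above. The result agrees with the lag-polynomial recursion, including the unstable random-walk case where $F$ has an eigenvalue of one.

```python
import numpy as np


def companion(phi):
    """Companion matrix F: phi_1..phi_p in the first row, identity below it."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    F = np.zeros((p, p))
    F[0, :] = phi
    F[1:, :-1] = np.eye(p - 1)
    return F


def ma_weights_from_companion(phi, n):
    """Read psi_j off as the top-left element of F^j for j = 0, ..., n."""
    F = companion(phi)
    Fj = np.eye(F.shape[0])  # F^0 = I
    psi = []
    for _ in range(n + 1):
        psi.append(Fj[0, 0])
        Fj = F @ Fj          # F^{j+1} = F F^j, as in the text
    return np.array(psi)


phi = [0.5, 0.3]
print(ma_weights_from_companion(phi, 5))
# first column of F^3 is (psi_3, psi_2)' for p = 2
print(np.linalg.matrix_power(companion(phi), 3)[:, 0])
# unstable case (random walk, unit eigenvalue): the weights are all one
print(ma_weights_from_companion([1.0], 4))
```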



2.3 The recursive formula for MA processes 

Using the lag operator, a zero-mean MA(q) process can be written as $y_t = \psi(L)\varepsilon_t$. The corresponding AR(∞) process $\phi(L)y_t = \varepsilon_t$ can be used in the same way as above to obtain the identity

$$
\begin{aligned}
1 = \psi(L)\phi(L) &= \phi(L) + \psi_1 L\phi(L) + \cdots + \psi_q L^q\phi(L)\\
&= \Big(1 - \sum_{j=1}^{\infty}\phi_j L^j\Big) + \psi_1 L\Big(1 - \sum_{j=1}^{\infty}\phi_j L^j\Big) + \cdots\\
&= 1 + (\psi_1 - \phi_1)L + (\psi_2 - \phi_2 - \psi_1\phi_1)L^2 + \cdots\\
&\quad + (\psi_q - \phi_q - \psi_1\phi_{q-1} - \psi_2\phi_{q-2} - \cdots - \psi_{q-1}\phi_1)L^q + \cdots\\
&\quad + (-\phi_{q+1} - \psi_1\phi_q - \cdots - \psi_q\phi_1)L^{q+1} + \cdots\\
&\quad + (-\phi_{q+2} - \psi_1\phi_{q+1} - \cdots - \psi_q\phi_2)L^{q+2} + \cdots
\end{aligned}
$$

Thus, $\phi_1 = \psi_1$, $\phi_2 = \psi_2 - \psi_1\phi_1$, and so on. To show that this is indeed the AR representation of both invertible and noninvertible MA processes given $\mathbf{v} = 0$ and $\mathbf{y}_{t-\tau} = 0$, one needs to write (4) in a form compatible with the companion form of an AR(p) process. Suppose we have $T > q$ observations of $y_t$, know that they were generated by an MA(q) process, and look for an AR representation of this process. To do so, write down the structure of the data-generating process in a rather unusual way:
where again $w$ denotes values of no interest for the analysis and each $M_i$ is a matrix of the same dimension as $F$. To find the recursive formula, use the knowledge that this MA process must be the generalized MA representation of the AR process we are interested in. Since for the AR process we are looking for $F\boldsymbol{\varepsilon}_{t-1} = M_1\boldsymbol{\varepsilon}_{t-1}$, it is obvious that $\psi_1 = \phi_1$. The first column of $M_2$ has been computed as the first column of $F^2$ in (5). Note that one can also think of the computation of the first column of $M_2$ as computing the first column of $FM_1$. This means that one can compute the first column of $M_3$ by using the first column of $FM_2$, which has the structure

$$
FM_2\,e_1 = \begin{pmatrix} \phi_1\psi_2 + \phi_2\psi_1 + \phi_3 \\ \psi_2 \\ \psi_1 \\ 1 \\ 0 \\ \vdots \end{pmatrix},
$$

thus, $\psi_3 = \phi_1\psi_2 + \phi_2\psi_1 + \phi_3$. Continuing with this approach, one finds for $j \le q$

$$\psi_j = \phi_1\psi_{j-1} + \phi_2\psi_{j-2} + \cdots + \phi_{j-1}\psi_1 + \phi_j.$$

As soon as $\psi_q$ is reached, the structure is (since $\psi_{q+1} = \psi_{q+2} = \cdots = 0$ for an MA(q) process):

$$
\begin{aligned}
0 &= \phi_1\psi_q + \cdots + \phi_q\psi_1 + \phi_{q+1}\\
0 &= \phi_2\psi_q + \cdots + \phi_{q+1}\psi_1 + \phi_{q+2}\\
&\;\;\vdots
\end{aligned}
$$

This recursive relation coincides with the one derived by using the lag operator and assuming invertibility and $\tau = \infty$.
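The recursion from MA to AR parameters can be sketched the same way. The Python function below is an illustration with a hypothetical name. It uses the convention $\psi(L) = 1 + \psi_1L + \cdots + \psi_qL^q$ and $\phi(L) = 1 - \sum_j \phi_jL^j$ from above and imposes no invertibility condition, so for a noninvertible MA the AR weights simply grow.

```python
import numpy as np


def ar_weights_from_ma(psi, n):
    """Return phi_1, ..., phi_n of the AR representation of an MA(q) process.

    Matches coefficients in psi(L) phi(L) = 1 with
    psi(L) = 1 + psi_1 L + ... + psi_q L^q and phi(L) = 1 - sum_j phi_j L^j:
        phi_j = psi_j - (psi_1 phi_{j-1} + ... + psi_{j-1} phi_1)   for j <= q
        phi_j = -(psi_1 phi_{j-1} + ... + psi_q phi_{j-q})          for j > q
    """
    psi = np.asarray(psi, dtype=float)
    q = len(psi)
    phi = np.zeros(n + 1)  # phi[0] is unused so that phi[j] holds phi_j
    for j in range(1, n + 1):
        acc = psi[j - 1] if j <= q else 0.0
        for i in range(1, min(j - 1, q) + 1):
            acc -= psi[i - 1] * phi[j - i]
        phi[j] = acc
    return phi[1:]


print(ar_weights_from_ma([0.5], 4))  # invertible MA(1): weights die out
print(ar_weights_from_ma([2.0], 4))  # noninvertible MA(1): weights explode
```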



3 Improving Durbin's Method 

I just demonstrated that for $j > q$, the $j$th parameter of the AR representation of an MA(q) process is generated by a $q$th-order difference equation:

$$0 = \phi_{j-q}\psi_q + \cdots + \phi_{j-1}\psi_1 + \phi_j \quad\Longleftrightarrow\quad \phi_j = -\sum_{i=1}^{q}\psi_i\phi_{j-i}$$



As noted in the introduction, the method of Durbin (1959) can be viewed as estimating this equation, i.e., estimating an AR process on the parameters of the initial AR estimate. Note that the recursive formula implies that for $j \le q$, the $j$th parameter of the initial AR is not generated by the full difference equation. Therefore, using the first $q$ parameters of the initial AR in the second estimation will usually not improve the quality of the estimate, since these observations are not generated by the process one seeks to estimate. The need to drop those estimates makes the information that $\phi_1 = \psi_1$ particularly helpful, because the first parameters are often the ones estimated with the greatest accuracy.
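To illustrate the second step as a regression on the AR coefficients, the following Python sketch runs the unrestricted OLS version (not Durbin's Yule-Walker variant) and uses only $j > q$ as observations. The function names are hypothetical. The noise-free input is an assumption chosen so that the exact MA parameters should come back.

```python
import numpy as np


def ar_weights_from_ma(psi, n):
    """phi_1..phi_n of the AR representation of an MA(q) (recursion from Section 2.3)."""
    psi = np.asarray(psi, dtype=float)
    q = len(psi)
    phi = np.zeros(n + 1)
    for j in range(1, n + 1):
        acc = psi[j - 1] if j <= q else 0.0
        for i in range(1, min(j - 1, q) + 1):
            acc -= psi[i - 1] * phi[j - i]
        phi[j] = acc
    return phi[1:]


def ols_on_ar_coefficients(phi_hat, q):
    """Unrestricted OLS of phi_j = -(psi_1 phi_{j-1} + ... + psi_q phi_{j-q}) + u_j.

    Only j = q+1, ..., p serve as observations, because phi_1..phi_q are not
    generated by the full difference equation.
    """
    phi_hat = np.asarray(phi_hat, dtype=float)  # phi_hat[j - 1] holds phi_j
    p = len(phi_hat)
    y = phi_hat[q:]  # phi_{q+1}, ..., phi_p
    X = -np.column_stack([phi_hat[q - i:p - i] for i in range(1, q + 1)])
    psi_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return psi_hat


psi_true = np.array([0.4, -0.3])
phi_exact = ar_weights_from_ma(psi_true, 30)
print(ols_on_ar_coefficients(phi_exact, q=2))  # recovers psi_true up to rounding
```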

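A classic approach to include stochastic a priori information in OLS estimation is the mixed estimator provided by Theil (1963). Write the a priori information as

$$\hat\phi_1 = R\psi + e, \qquad R = (1, 0, \ldots, 0),$$

where $e$ denotes the estimation error from using $\hat\phi_1$ instead of the true parameter $\phi_1 = \psi_1$. Denoting the variance of $e$ as $\sigma^2(\hat\phi_1)$, the mixed estimator for $\psi = (\psi_1, \ldots, \psi_q)'$ is

$$\hat\psi = \Big(\frac{1}{\sigma^2}X'X + \frac{1}{\sigma^2(\hat\phi_1)}R'R\Big)^{-1}\Big(\frac{1}{\sigma^2}X'y + \frac{1}{\sigma^2(\hat\phi_1)}R'\hat\phi_1\Big),$$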

where $X$ denotes the matrix of explanatory variables, $y$ is the vector containing the dependent variables and $\sigma^2$ is the variance of the disturbances. As usual, one has to replace $\sigma^2$ and $\sigma^2(\hat\phi_1)$ with their OLS estimates $\hat\sigma^2$ and $\hat\sigma^2(\hat\phi_1)$. The asymptotic variance of the estimates is given by the diagonal elements of

$$\Big(\frac{1}{\hat\sigma^2}X'X + \frac{1}{\hat\sigma^2(\hat\phi_1)}R'R\Big)^{-1}.$$

Note that this gives a second reason for excluding $\hat\phi_1$ from $X$ and $y$: it ensures that the assumption of no correlation between $e$ and the innovations of the core regression model, which underlies the mixed estimator, is not violated.
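One way to implement the estimator above for the univariate case is sketched below in Python. This sketch rests on three assumptions. First, $\sigma^2$ is estimated from the residuals of the unrestricted OLS regression. Second, the prior is supplied as a variance `var_phi1`. Third, `first_dep` is a hypothetical option, needed because the text drops $\hat\phi_1, \ldots, \hat\phi_q$ as observations but also speaks of excluding $\hat\phi_1$ from $X$.

```python
import numpy as np


def mixed_estimator(phi_hat, var_phi1, q, first_dep=None):
    """Mixed (Theil-type) estimate of psi from long-AR estimates phi_hat.

    Regression: phi_j = -(psi_1 phi_{j-1} + ... + psi_q phi_{j-q}) + u_j.
    Prior:      phi_hat_1 = R psi + e with R = (1, 0, ..., 0), Var(e) = var_phi1.
    first_dep:  hypothetical option giving the first j used as dependent
                variable (default q + 1); q + 2 keeps phi_hat_1 out of X too.
    Returns the estimate and (X'X / s2 + R'R / var_phi1)^{-1}.
    """
    phi_hat = np.asarray(phi_hat, dtype=float)  # phi_hat[j - 1] holds phi_j
    p = len(phi_hat)
    js = np.arange(q + 1 if first_dep is None else first_dep, p + 1)
    y = phi_hat[js - 1]
    X = -np.column_stack([phi_hat[js - i - 1] for i in range(1, q + 1)])
    # disturbance variance from the unrestricted OLS residuals
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b_ols
    s2 = resid @ resid / (len(y) - q)
    R = np.zeros((1, q))
    R[0, 0] = 1.0  # the prior only concerns psi_1
    A = X.T @ X / s2 + R.T @ R / var_phi1
    b = X.T @ y / s2 + R[0] * phi_hat[0] / var_phi1
    cov = np.linalg.inv(A)
    return cov @ b, cov
```

In practice `var_phi1` would be the squared standard error of $\hat\phi_1$ from the first-step OLS fit of the long AR.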



4 Some remarks and a simulation 

The recursive formula is helpful for thinking about the problems concerning MA processes with roots inside the unit circle. It implies that such a process has an AR representation whose parameters usually explode for $p \to \infty$. At the same time there exists a "sibling" MA process with the same first and second moments and all roots outside the unit circle, implying a non-explosive AR representation. Requiring $\mathbf{w}_t = 0$ makes sure that the first and second moments are stationary, which tends to make an estimator pick the representation with non-explosive parameters and thus the invertible MA process. This means that there is no hope of estimating the parameters of a noninvertible MA process with the help of the recursive formula by using some kind of restricted estimator. Since the parametrization one tries to estimate is unlikely, the estimator will use the remaining degrees of freedom to yield an MA process that is "as invertible as possible" given the restrictions, and thus produce biased estimates.
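The "sibling" can be computed by reflecting every root $r$ of $\psi(z)$ that lies inside the unit circle to $1/\bar r$ and rescaling the innovation variance. The text does not spell out this construction. The Python sketch below (hypothetical function names) should therefore be read as the standard root-flipping argument rather than the author's procedure. The helper `autocovariances` is included to check that the first and second moments agree.

```python
import numpy as np


def invertible_sibling(psi, sigma2=1.0):
    """Return the invertible MA with the same autocovariances as psi.

    psi(z) = 1 + psi_1 z + ... + psi_q z^q = prod_k (1 - z / r_k). Each root
    r_k with |r_k| < 1 is replaced by 1 / conj(r_k), and the innovation
    variance is multiplied by |r_k|^(-2) to keep the spectrum unchanged.
    """
    coefs = np.r_[1.0, np.asarray(psi, dtype=float)]
    roots = np.roots(coefs[::-1])  # np.roots expects the highest power first
    inside = np.abs(roots) < 1
    new_roots = np.where(inside, 1.0 / np.conj(roots), roots)
    poly = np.array([1.0 + 0.0j])
    for r in new_roots:
        poly = np.convolve(poly, [1.0, -1.0 / r])  # multiply by (1 - z / r)
    new_sigma2 = sigma2 * np.prod(np.abs(roots[inside]) ** -2.0)
    return poly[1:].real, new_sigma2


def autocovariances(psi, sigma2=1.0):
    """gamma_0, ..., gamma_q of y_t = eps_t + psi_1 eps_{t-1} + ... ."""
    c = np.r_[1.0, np.asarray(psi, dtype=float)]
    return np.array([sigma2 * (c[:len(c) - k] @ c[k:]) for k in range(len(c))])


psi_sib, s2_sib = invertible_sibling([2.0])
print(psi_sib, s2_sib)  # MA(1) with psi = 2 maps to psi of about 0.5, variance about 4
```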

That said, one should also appreciate the relative scale of the problem. Consider an MA process with one root that is just a little smaller than one, say 0.98. If a practitioner knows this from theory and is confronted with only a few hundred observations, he or she should not spend sleepless nights worrying about the bias induced by the noninvertibility. The reason is simple: the invertible sibling of the MA process has the same roots except one, which is 1/0.98 instead of 0.98. Therefore its parameters are often very similar. In fact, estimates for an MA process with all roots outside the unit circle except one might be as precise as the estimates for a process whose smallest root is, say, 1.001. The reason is as follows: the AR representation of an invertible MA process is de facto finite if one treats values below the precision of a digital computer as zero. However, this AR representation is very long for an MA process with a root of approximately one. This means that, faced with a few hundred observations generated by a close-to-unit-root MA process, one cannot avoid specifying an insufficient number of AR lags, which distorts the parameter estimates. A reasonable AR representation of the invertible counterpart of an MA process with 0.98 as smallest root tends to be shorter and thus induces a smaller truncation bias. Therefore, the smaller truncation bias can counterbalance the general bias introduced by estimating the "wrong" process if a noninvertible MA process has only one root inside but close to the unit circle.
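The truncation argument can be illustrated roughly for an MA(1) with $\psi(z) = 1 + \theta z$. The Python sketch below counts how many AR weights exceed machine epsilon, as a stand-in for "de facto finite". The machine-epsilon cutoff and the helper names are assumptions of this example. The count grows sharply as the root modulus $1/\theta$ approaches one (compare $1/0.98 \approx 1.02$ with $1.001$).

```python
import numpy as np


def ar_weights_from_ma(psi, n):
    """phi_1..phi_n of the AR representation of an MA(q) process."""
    psi = [float(x) for x in psi]
    q = len(psi)
    phi = [0.0] * (n + 1)  # plain lists keep the long loop reasonably fast
    for j in range(1, n + 1):
        acc = psi[j - 1] if j <= q else 0.0
        for i in range(1, min(j - 1, q) + 1):
            acc -= psi[i - 1] * phi[j - i]
        phi[j] = acc
    return np.array(phi[1:])


def effective_ar_length(psi, tol=np.finfo(float).eps, max_lags=100_000):
    """Index of the last AR weight whose magnitude exceeds tol (0 if none)."""
    phi = ar_weights_from_ma(psi, max_lags)
    big = np.flatnonzero(np.abs(phi) > tol)
    return int(big[-1]) + 1 if big.size else 0


# MA(1) psi(z) = 1 + theta z has its root at -1/theta
for theta in (0.5, 0.98, 1 / 1.001):
    print(f"root modulus {1 / theta:.4f}: about {effective_ar_length([theta])} AR lags")
```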

To illustrate this point as well as the gains of using a restricted OLS estimator instead of Durbin's original estimator, I simulated both estimation procedures over the parameter space of an MA(2) process. I investigate the region $\psi_1, \psi_2 \in [-2.2, +2.2]$, which is divided into a grid with 221 points, and partition the simulations into invertible and noninvertible processes. To study the gain when estimating invertible processes, all points in the grid that yield noninvertible processes are skipped. To study the performance when estimating noninvertible processes, points where either no root is inside the unit circle or where the modulus of the smallest root falls short of 0.8 are skipped.

For each point in the grid that is not skipped, 400 artificial observations with standard normal innovations are generated. An AR(100) is estimated using OLS, and these estimates are used to estimate an AR(2) process in two ways: with the Yule-Walker estimator without dropping any estimate (Durbin's method), and with Theil's mixed estimator using the restriction $\psi_1 = \phi_1$ while dropping $\hat\phi_1$ and $\hat\phi_2$ from the sample. The squared differences between the true parameters and the estimates are computed and saved. After repeating this procedure 500 times for the particular point in the grid, the mean of all squared differences is taken as the estimate of the MSE. Figure 1 exhibits the results.



[Figure 1 panels: invertible process, Durbin's method; invertible process, restricted OLS; noninvertible process, Durbin's method]

Figure 1: Simulation for the parameter space of an MA(2) process. The lower panel exhibits noninvertible processes with roots between 0.8 and 1.



Note that in Figure 1 there are small regions where Durbin's method yields slightly better results than restricted OLS. However, for wide areas of the parameter space restricted OLS yields a far smaller MSE. Further note that, as stated above, the MSE tends to increase only moderately outside the invertibility triangle.

A question of great practical importance is what order the initial AR estimation should have. This question is delicate since it involves a trade-off. On the one hand, a larger AR order allows the sample used to estimate the MA process to be larger and tends to reduce the truncation bias. On the other hand, the precision of the estimates worsens the more parameters have to be estimated. There is some fairly elaborate work on the optimal trade-off by Broersen (2000). However, in most cases a simple rule of thumb works satisfactorily: make the AR process as large as possible but avoid having fewer than 4 data points per estimated parameter.
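The rule of thumb leaves open whether "data points" means all observations or only the rows usable after losing $p$ initial lags. The hypothetical Python helper below implements both readings. The first reproduces the AR(100) used with 400 observations in the simulation.

```python
def max_ar_order(n_obs, points_per_param=4, effective_sample=False):
    """Largest AR order p with at least `points_per_param` data points per coefficient.

    effective_sample=False counts all observations (n_obs / p >= 4); for 400
    observations this gives p = 100, the order used in the simulation above.
    effective_sample=True counts only the n_obs - p usable regression rows
    ((n_obs - p) / p >= 4, i.e. p <= n_obs / 5), a more conservative reading.
    """
    divisor = points_per_param + 1 if effective_sample else points_per_param
    return n_obs // divisor


print(max_ar_order(400), max_ar_order(400, effective_sample=True))  # 100 80
```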



5 Generalization for Multivariate Processes 

Denoting the parameters of a $k$-dimensional MA process as $\Psi_j$ and the parameters of its VAR representation as $\Phi_i$, it is easy to see that exactly the same recursive formulae apply to multivariate processes, provided that the assumption holds for each of the time series and the constants are zero. In this case one can express $\Phi_m$ with $m > q$ as:
This means that, given $T \gg q$ estimates for the parameters of the VAR, the data model for the estimation of a multivariate MA process is
It is easy to see that one can divide this into $k$ single standard regression models of the kind $\phi_j = -\Phi\psi_j + u_j$, where $\psi_j$ is the $j$th column of $\Psi$ and $\phi_j$ is the $j$th column of $\Phi$, implying that the unrestricted OLS estimate is
However, for multivariate processes it is also very useful to exploit the knowledge that $\Psi_1 = \Phi_1 + e$. Since we estimate $k$ single regression models, one has to use a different restriction for each estimation. For estimation $j$, write the a priori information as

where $O$ is a matrix of zeros. This allows one to write the mixed estimator for estimation $j$ as
where $\Sigma(\Phi_1)_j$ is a diagonal matrix with the variances of the OLS parameter estimates for the $j$th column of $\Phi_1$ on its main diagonal. There is no covariance as long as one uses the standard OLS estimator for VARs, as described, e.g., in Lütkepohl (2005, pp. 71). This estimator can be obtained by rewriting the VAR and transforming it into a standard linear regression model with the help of the vec operator. As pointed out by Lütkepohl, this is identical to an OLS estimation of the $k$ equations separately. This means that the estimation does not provide information about the covariance between the elements in the columns of $\Phi_1$; we therefore only use the variances. Similar to the univariate case, the variances of the MA estimates can be computed using just the left part of the estimator.
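For the multivariate case, the following Python sketch (hypothetical function names) computes the VAR weights of a VMA(q) from a matrix version of the recursion. It then recovers the $\Psi_i$ by unrestricted, column-by-column OLS. It assumes the $k$-dimensional VMA is written $y_t = \varepsilon_t + \Psi_1\varepsilon_{t-1} + \cdots + \Psi_q\varepsilon_{t-q}$ and uses the ordering $\Phi(L)\Psi(L) = I$. Under that ordering the $j$th column of each $\Psi_i$ appears only in the $j$th regression. The mixed-estimator step with the prior $\Psi_1 = \Phi_1 + e$ is not included.

```python
import numpy as np


def var_weights_from_vma(Psi, n):
    """Phi_1, ..., Phi_n of the VAR representation of a VMA(q).

    Psi is a list [Psi_1, ..., Psi_q] of k x k matrices. Coefficient matching
    in Phi(L) Psi(L) = I with Phi(L) = I - sum_m Phi_m L^m gives
        Phi_m = Psi_m - sum_{i=1}^{m-1} Phi_{m-i} Psi_i,  Psi_i = 0 for i > q.
    """
    Psi = [np.asarray(P, dtype=float) for P in Psi]
    q, k = len(Psi), Psi[0].shape[0]
    Phi = []  # Phi[m - 1] holds Phi_m
    for m in range(1, n + 1):
        acc = Psi[m - 1].copy() if m <= q else np.zeros((k, k))
        for i in range(1, min(m - 1, q) + 1):
            acc -= Phi[m - i - 1] @ Psi[i - 1]
        Phi.append(acc)
    return Phi


def vma_from_var_ols(Phi, q):
    """Unrestricted OLS of Phi_m = -[Phi_{m-1}, ..., Phi_{m-q}] B + U for m > q.

    B stacks Psi_1, ..., Psi_q vertically. lstsq treats each column of the
    stacked Phi_m separately, i.e. k single regressions with a common X.
    """
    n = len(Phi)
    Y = np.vstack([Phi[m - 1] for m in range(q + 1, n + 1)])
    X = -np.vstack([np.hstack([Phi[m - i - 1] for i in range(1, q + 1)])
                    for m in range(q + 1, n + 1)])
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    k = Y.shape[1]
    return [B[i * k:(i + 1) * k] for i in range(q)]


Psi = [np.array([[0.5, 0.1], [0.0, 0.3]]), np.array([[-0.2, 0.0], [0.1, 0.1]])]
Phi = var_weights_from_vma(Psi, 40)
Psi_hat = vma_from_var_ols(Phi, q=2)
print(np.round(Psi_hat[0], 6), np.round(Psi_hat[1], 6), sep="\n")
```

With exact VAR weights as input, the regressions reproduce the MA matrices up to rounding.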



6 Conclusion 

This paper provides a simple method to estimate both univariate and multivariate MA processes by exploiting the recursive relation between the MA process and its AR representation. A simulation study for the parameter space of an MA(2) process indicates that the method tends to have a smaller MSE than Durbin's method and is relatively robust with respect to unit roots of the MA process, in the sense that the MSE increases only moderately outside the invertibility triangle.